Abstract. This paper proposes a simple but effective graph-based agglomerative
algorithm for clustering high-dimensional data. We explore the different roles of
two fundamental concepts in graph theory, indegree and outdegree, in the context
of clustering. The average indegree reflects the density near a sample, and
the average outdegree characterizes the local geometry around a sample. Based
on these insights, we define the affinity measure of clusters via the product of
average indegree and average outdegree. The product-based affinity makes our
algorithm robust to noise. The algorithm has three main advantages: good
performance, easy implementation, and high computational efficiency. We test the
algorithm on two fundamental computer vision problems: image clustering and
object matching. Extensive experiments demonstrate that it outperforms the
state of the art in both applications.



1 Introduction



Many problems in computer vision involve clustering. Partitional clustering, such as
k-means [1], determines all clusters at once, while agglomerative clustering [1] begins
with a large number of small clusters and iteratively merges the two clusters with the
largest affinity under some measure, until a stopping condition is reached.
Agglomerative clustering has been studied for more than half a century and is used in many
applications [1], because it is conceptually simple and produces an informative
hierarchical structure of clusters.

Classical agglomerative clustering algorithms have several limitations [1], which
have restricted their wider application in computer vision. The data in computer
vision applications are usually high-dimensional. The data clusters often differ
in density, size, and shape, and form manifold structures. In addition,
the data often contain noise and outliers. The conventional agglomerative clustering
algorithms, such as the well-known linkage methods [1], usually fail to tackle these
challenges. As their affinities are directly computed using pairwise distances between
samples and cannot capture the global manifold structures in high-dimensional spaces,

1 The code and supplemental materials are publicly available at
http://mmlab.ie.cuhk.edu.hk/research/gdl/

Fig. 1. (a) Indegree can be used to detect the change of densities. The density in Cluster a is high,
and the density in Cluster b is low. The vertices inside Cluster a are strongly connected, but there
is no outedge to vertices outside Cluster a. So the indegree of k from Cluster a is nonzero, while
the indegrees of i (a vertex in Cluster b) and j (an outlier) from Cluster a are zero. If an undirected
graph is considered without separating indegrees and outdegrees, both i and j have the same
degree from Cluster a as k. (b) The product of the indegree and outdegree is an affinity measure
robust to noisy edges between the two clusters. Under this measure, Cluster a and Cluster b have
zero affinity, i.e., the sum of the product of indegree and outdegree over all vertices is 0, and thus
they are well separated.



these algorithms have difficulty clustering high-dimensional data and are quite
sensitive to noise and outliers [1].

To tackle these problems, we propose a simple and fast graph-based agglomerative
clustering algorithm. The graph representation of data has been extensively exploited
in various machine learning topics [2,3,4,5,6], but has rarely been utilized in
agglomerative clustering. Our algorithm builds K-nearest-neighbor (K-NN) graphs using the
pairwise distances between samples, since studies JH show the effectiveness of using
local neighborhood graphs to model data lying on a low-dimensional manifold
embedded in a high-dimensional space.

We use the indegree and outdegree, fundamental concepts in graph theory, to
characterize the affinity between two clusters. The outdegree of a vertex to a cluster
measures the similarity between the vertex and the cluster. If many of the K-NNs of the
vertex belong to the cluster, the outdegree is large. The outdegree can capture the
manifold structures in the high-dimensional space. The indegree of a vertex from a cluster
reflects the density near the vertex. It is effective for detecting the change of densities,
which often occurs at the boundary of clusters. Therefore, we use it to separate clusters
that are close in space but different in density, and also to reduce the effect of noise.
An example is shown in Fig. 1(a). To the best of our knowledge, properties of the indegree
and outdegree have not been explored by any existing clustering algorithm, although they
have been successfully applied in the analysis of complex networks such as the World Wide
Web [9] and social networks [10], with interesting results.

Fig. 2. Results of different clustering algorithms on a synthetic multiscale dataset. Our algorithm
can perfectly discover the three clusters with different shapes, sizes, and densities. The output
clusters are shown in color (best viewed on screen).



Our affinity measure between two clusters is defined as follows. First, the structural
affinity from a vertex to a cluster is defined via the product of the average indegree from
the cluster and the average outdegree to the cluster. Intuitively, if a vertex belongs to a
cluster, it should be strongly connected to the cluster, i.e., both its indegree and outdegree
are large. Otherwise, either the indegree or the outdegree is small. Therefore, the product
of indegree and outdegree can be a good affinity measure (Fig. 1(b)). Using synthetic data
in Fig. 3, we show that the correlation between the inter-cluster indegree and outdegree is
weak across different vertices if the two clusters belong to different ground-truth clusters.
Then, the affinity between two clusters is naturally the aggregated affinity
measure over all the vertices in the two clusters.

Our algorithm has three main advantages as follows.

First of all, it has outstanding performance, especially on noisy data and multiscale
data (i.e., clusters with different densities). The visual comparisons with linkage
methods [1], graph-based average linkage, affinity propagation (AP) [7], spectral clustering
(SC) y], and directed graph spectral clustering (DGSC) [8] on synthetic multiscale data
are shown in Fig. 2. Noise and multiple scales can greatly degrade the performance of
spectral clustering [11], while the indegree and outdegree in our algorithm detect
the boundary of scales automatically and reduce the effect of noise. In Sec. 4, extensive
experiments on real data, including imagery data and feature correspondence data,
demonstrate its superiority over state-of-the-art methods. These experiments address two
fundamental problems in computer vision, i.e., image clustering and object matching,
and the results suggest many potential applications of our work.

Second, it is easy to implement. This affinity measure can be expressed in a
matrix form and implemented with vector additions and inner products. Therefore, our
algorithm can be implemented without any dependency on external numerical libraries,
such as eigen-decomposition, which is extensively employed by many clustering
algorithms J2EH2I.

2 E.g., if cluster a has higher density than cluster b, the boundary of cluster a will have high
indegree and low outdegree, while the boundary of cluster b will have low indegree and high
outdegree.

Finally, it is very fast. We propose an acceleration method for our algorithm. In
practice, our algorithm is much faster than spectral clustering [2,3], especially on
large-scale data.



2 Related Work 



The literature dedicated to agglomerative clustering is abundant [1,13,14]. Linkages [1],
e.g., average linkage, define the affinity based on pairwise distances between samples.
Since pairwise distances do not capture the global structures of data well, these methods
fail on data with complex structures and are sensitive to noise [1] (see the
example in Fig. 2). Many variants of linkage methods, such as DBSCAN [15], have been
proposed in the data mining community and show satisfactory performance. However,
they usually fail to tackle the great challenge from high-dimensional spaces, because
their sophisticated affinity measures are based on observations from low-dimensional
data [16].

Several algorithms [17,18,19] have attempted to perform agglomerative clustering
on the graph representation of data. Chameleon [17] defines the cluster affinity from
relative interconnectivity and relative closeness, both of which are based on a min-cut
bisection of clusters. Although good performance was shown on 2D toy datasets, it
suffers from high computational cost because its affinity measure is based on a min-cut
algorithm. Zell [18] describes the structure of a cluster via the zeta function and defines
the affinity based on the structural changes after merging. It needs to compute a matrix
inverse in each affinity computation, so it is much slower than our simple algorithm (see
Sec. 4.1). Felzenszwalb and Huttenlocher proposed an effective algorithm for image
segmentation [19].

Besides agglomerative clustering, k-means [1] and spectral clustering [2,3,20] are
among the most widely used clustering algorithms. However, k-means is sensitive to
the initialization and has difficulty handling clusters with varying densities and sizes, or
manifold shapes. Although spectral clustering can handle manifold data well, its
performance usually degrades greatly in the presence of noise and outliers, because
the eigenvectors of the graph Laplacian are sensitive to noisy perturbations [5]. Affinity
Propagation [7] explores the intrinsic data structures by message passing among data
points. Although it performs well on high-dimensional data, it usually requires
considerable run-time, especially when the preference value cannot be manually set.

Directed graphs have been studied for spectral clustering (e.g., [8]). However, these
methods symmetrize the directed graph before the clustering task. In contrast, we only
symmetrize the affinity between two clusters, while keeping the directed graph during the
clustering process. Therefore, our algorithm utilizes more information from the
asymmetry and is more robust to noisy edges (see Fig. 2 for a comparison between DGSC
[8] and our algorithm).

3 Graph Degree Linkage

3.1 Neighborhood Graph

Given a set of samples $X = \{x_1, x_2, \ldots, x_n\}$, we build a directed graph $G = (V, E)$,
where $V$ is the set of vertices corresponding to the samples in $X$, and $E$ is the set of
edges connecting vertices. The graph is associated with a weighted adjacency matrix
$W = [w_{ij}]$, where $w_{ij}$ is the weight of the edge from vertex $i$ to vertex $j$. $w_{ij} = 0$ if
and only if there is no edge from $i$ to $j$.

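To capture the manifold structures in high-dimensional spaces, we use the K-NN
graph, in which the weights are defined as

$$
w_{ij} =
\begin{cases}
\exp\left(-\dfrac{\mathrm{dist}(i,j)^2}{\sigma^2}\right), & \text{if } x_j \in \mathcal{N}_i^K, \\
0, & \text{otherwise,}
\end{cases}
\tag{1}
$$

where $\mathrm{dist}(i,j)$ is the distance between $x_i$ and $x_j$, $\mathcal{N}_i^K$ is the set of K-nearest
neighbors of $x_i$, and $\sigma^2$ is set as
$\sigma^2 = \frac{a}{nK} \sum_{i=1}^{n} \sum_{x_j \in \mathcal{N}_i^K} \mathrm{dist}(i,j)^2$.
$K$ and $a$ are free parameters to be set. In a K-NN graph, there is an edge pointing from
$x_i$ to $x_j$ with weight $w_{ij}$ if $x_j \in \mathcal{N}_i^K$.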
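The graph construction above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' released code: it assumes Euclidean distances, a Gaussian kernel of the form exp(-dist^2 / sigma^2), and sigma^2 equal to `a` times the mean squared distance over all K-NN edges. The function name `knn_graph` is hypothetical.

```python
import numpy as np

def knn_graph(X, K, a=1.0):
    """Directed K-NN graph with Gaussian weights; returns an (n, n) matrix W.

    W[i, j] > 0 only if sample j is among the K nearest neighbors of sample i.
    """
    n = X.shape[0]
    # Pairwise squared Euclidean distances (assumed distance measure).
    diff = X[:, None, :] - X[None, :, :]
    d2 = np.sum(diff ** 2, axis=2)
    np.fill_diagonal(d2, np.inf)           # a sample is not its own neighbor
    nbrs = np.argsort(d2, axis=1)[:, :K]   # indices of the K nearest neighbors
    rows = np.repeat(np.arange(n), K)
    cols = nbrs.ravel()
    # sigma^2 = a * (mean squared distance over all K-NN edges)
    sigma2 = a * d2[rows, cols].mean()
    W = np.zeros((n, n))
    W[rows, cols] = np.exp(-d2[rows, cols] / sigma2)
    return W

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))
W = knn_graph(X, K=4)
```

Each row of `W` has exactly K nonzero entries, but columns may have more or fewer, which is why the graph is directed and why indegree and outdegree can differ.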



3.2 Algorithm Overview 

The graph degree linkage (GDL) algorithm begins with a number of initial small
clusters, and iteratively selects the two clusters with the maximum affinity to merge. The
affinities are computed on the K-NN graph, based on the indegree and outdegree of vertices
in the two clusters.

The initial small clusters are simply constructed as weakly connected components
of a $K^0$-NN graph, where the neighborhood size $K^0$ is small, typically 1 or 2. Then,
each component is an initial cluster, and each sample is assigned to only one cluster.

Definition 1 A connected component of an undirected graph is a maximal connected
subgraph in which any two vertices are connected to each other by paths.
A weakly connected component of a directed graph is a connected component of the
undirected graph produced by replacing all of its directed edges with undirected edges.
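As an illustration of Definition 1, the initial clusters can be found with a union-find pass that ignores edge direction. This is a sketch under stated assumptions: the function name and the toy edge list are hypothetical, and any standard connected-components routine would serve equally well.

```python
def weakly_connected_components(n, edges):
    """Group vertices 0..n-1 into weakly connected components (union-find)."""
    parent = list(range(n))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for i, j in edges:           # edge direction is ignored
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj

    groups = {}
    for v in range(n):
        groups.setdefault(find(v), []).append(v)
    return sorted(groups.values())

# Toy directed 1-NN edges on six samples: i -> nearest neighbor of i.
edges = [(0, 1), (1, 0), (2, 1), (3, 4), (4, 3), (5, 4)]
clusters = weakly_connected_components(6, edges)
```

In this toy example the six samples fall into two initial clusters, {0, 1, 2} and {3, 4, 5}.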

The GDL algorithm is presented as Algorithm 1, with details given in the following
subsection.



3.3 Affinity Measure via Product of Indegree and Outdegree 

The affinity measure between two clusters is the key of an agglomerative clustering
algorithm. Our affinity measure is based on indegree and outdegree in the graph
representation. For simplicity, we start from measuring the affinity between a vertex and a
cluster.

Indegree and outdegree. Considering a vertex and a cluster, the connectivity
between them by inedges and outedges can be quantified using the concepts of indegree
and outdegree.

Algorithm 1 Graph Degree Linkage (GDL)

Input: a set of $n$ samples $X = \{x_1, x_2, \ldots, x_n\}$, and the target number of clusters $n_T$.
Build the $K^0$-NN graph, and detect its weakly connected components as initial clusters. Denote
the set of initial clusters as $V^c = \{C_1, \ldots, C_{n_c}\}$, where $n_c$ is the number of clusters.
Build the K-NN graph, and get the weighted adjacency matrix $W$.
while $n_c > n_T$ do
    Search two clusters $C_a$ and $C_b$, such that $\{C_a, C_b\} = \arg\max_{C_a, C_b \in V^c} A_{C_a,C_b}$, where
    $A_{C_a,C_b}$ is the affinity measure between $C_a$ and $C_b$, computed using Eq. (5).
    $V^c \leftarrow \{V^c \setminus \{C_a, C_b\}\} \cup \{C_a \cup C_b\}$, and $n_c = n_c - 1$.
end while
Output: $V^c$.
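The merging loop of Algorithm 1 can be sketched directly. The code below is a naive illustration that recomputes every pairwise affinity in each iteration; it is not the accelerated variant, and the function names are hypothetical. It assumes `W[i, j]` holds the weight of the edge from vertex i to vertex j, and that the affinity is the sum, over vertices of one cluster, of the product of average indegree from and average outdegree to the other cluster, taken in both directions.

```python
import numpy as np

def affinity(W, Ca, Cb):
    """Symmetric affinity between clusters Ca and Cb (lists of vertex indices)."""
    Wab = W[np.ix_(Ca, Cb)]   # edges from Ca to Cb
    Wba = W[np.ix_(Cb, Ca)]   # edges from Cb to Ca
    # Cb -> Ca: for each i in Cb, (indegree from Ca) * (outdegree to Ca)
    b_to_a = (Wab.sum(axis=0) @ Wba.sum(axis=1)) / len(Ca) ** 2
    # Ca -> Cb: for each i in Ca, (indegree from Cb) * (outdegree to Cb)
    a_to_b = (Wba.sum(axis=0) @ Wab.sum(axis=1)) / len(Cb) ** 2
    return b_to_a + a_to_b

def gdl_naive(W, clusters, n_target):
    """Merge the pair with the largest affinity until n_target clusters remain."""
    clusters = [list(c) for c in clusters]
    while len(clusters) > n_target:
        best, pair = -1.0, None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                s = affinity(W, clusters[p], clusters[q])
                if s > best:
                    best, pair = s, (p, q)
        p, q = pair
        merged = clusters[p] + clusters[q]
        clusters = [c for k, c in enumerate(clusters) if k not in (p, q)]
        clusters.append(merged)
    return clusters

# Two dense groups {0,1,2} and {3,4,5} plus one noisy one-way edge 2 -> 3.
W = np.zeros((6, 6))
for group in ([0, 1, 2], [3, 4, 5]):
    for i in group:
        for j in group:
            if i != j:
                W[i, j] = 1.0
W[2, 3] = 1.0
result = gdl_naive(W, [[0, 1], [2], [3, 4], [5]], n_target=2)
```

The one-way edge from vertex 2 to vertex 3 contributes nothing to the affinity between the two groups, so the two dense groups are recovered.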



Definition 2 Given a vertex $i$, the average indegree from and the average outdegree
to a cluster $C$ are defined as $\deg_i^-(C) = \frac{1}{|C|} \sum_{j \in C} w_{ji}$ and
$\deg_i^+(C) = \frac{1}{|C|} \sum_{j \in C} w_{ij}$,
respectively, where $|C|$ is the cardinality of set $C$.

As we stated in Sec. 1, the indegree measures the density near sample $i$, and the
outdegree characterizes the K-NN similarity from vertex $i$ to cluster $C$. We use the size
of the cluster to normalize the degrees; otherwise, the algorithm may favor merging
large clusters instead of merging small clusters with dense connections. We find that in
practice the normalized degrees work much better than the unnormalized degrees.

Affinity between a vertex and a cluster. A vertex should be merged into a cluster
if it is strongly connected to the cluster by both inedges and outedges. Mathematically,
the correlation of the two types of degree is weak if the vertex and the cluster belong to
different ground-truth clusters, and strong otherwise. To verify this intuition, we show
such statistics on synthetic data in Fig. 3. Therefore, we define the affinity as the product
of the average indegree and average outdegree, i.e.,

$$
A_{i \to C} = \deg_i^-(C) \, \deg_i^+(C). \tag{2}
$$

This affinity is robust to noisy edges between different ground-truth clusters because
the product can be zero if the inedges and outedges do not coincide.
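A small numerical example may make the robustness argument concrete. The sketch below assumes the convention that `W[i, j]` is the weight of the edge from i to j; the helper names and the toy graph are hypothetical.

```python
import numpy as np

def avg_indegree(W, i, C):
    """Average weight of edges pointing from cluster C into vertex i."""
    return W[C, i].sum() / len(C)

def avg_outdegree(W, i, C):
    """Average weight of edges pointing from vertex i into cluster C."""
    return W[i, C].sum() / len(C)

def vertex_cluster_affinity(W, i, C):
    """Product of average indegree and average outdegree."""
    return avg_indegree(W, i, C) * avg_outdegree(W, i, C)

W = np.zeros((4, 4))
# Dense cluster {0, 1, 2}: every pair is connected in both directions.
W[0, 1] = W[1, 0] = W[1, 2] = W[2, 1] = W[0, 2] = W[2, 0] = 1.0
# Outlier 3 points into the cluster, but nothing in the cluster points back.
W[3, 0] = 1.0
C = [0, 1, 2]
aff_outlier = vertex_cluster_affinity(W, 3, C)
```

The outlier has a nonzero outdegree to the cluster (1/3) but a zero indegree from it, so the product is zero even though a one-way edge exists.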

Affinity between two clusters. Following the above, we define the asymmetric
affinity from cluster $C_b$ to cluster $C_a$ by summing over all the vertices in
$C_b$, i.e.,

$$
A_{C_b \to C_a} = \sum_{i \in C_b} A_{i \to C_a} = \sum_{i \in C_b} \deg_i^-(C_a) \, \deg_i^+(C_a). \tag{3}
$$

Finally, we have the symmetric affinity used in our algorithm as

$$
A_{C_a, C_b} = A_{C_b \to C_a} + A_{C_a \to C_b}. \tag{4}
$$

Efficient computation of affinity. Our affinity measure can be computed efficiently
using the following theorem.

Theorem 1 The affinity between $C_a$ and $C_b$ defined in Eq. (4) can be expressed in the
matrix form

$$
A_{C_a, C_b} = \frac{1}{|C_a|^2} \mathbf{1}_{|C_a|}^T W_{C_a,C_b} W_{C_b,C_a} \mathbf{1}_{|C_a|}
+ \frac{1}{|C_b|^2} \mathbf{1}_{|C_b|}^T W_{C_b,C_a} W_{C_a,C_b} \mathbf{1}_{|C_b|}, \tag{5}
$$

where $W_{C_a,C_b}$ is the submatrix of $W$ whose row indices correspond to the vertices in
$C_a$ and column indices correspond to the vertices in $C_b$, i.e., the weights of edges from
$C_a$ to $C_b$, and $\mathbf{1}_L$ is an all-one vector of length $L$.

Fig. 3. To verify the robustness of the product of indegree and outdegree as an affinity measure
from a vertex $i$ to a cluster $C$, we compare statistics in two cases: $i$ and $C$ belong to different
ground-truth clusters, e.g., $i \in C_1$ and $C = C_2$ as in (a), and $i$ and $C$ are in the same ground-truth
cluster, e.g., $i \in C_1$ and $C = C_1$ as in (b). We see that, in the first case, the product is a quantity
more robust than the indegree or outdegree. For all $i \in C_1$ such that $\deg_i^- > 0$ or $\deg_i^+ > 0$, the
mean and proportion of nonzero values (PNZ) of $\sqrt{\deg_i^- \deg_i^+}$ are much smaller than those of
$\deg_i^-$ and $\deg_i^+$, which implies a small affinity between $i$ and $C$. Here the square root is for fair
comparison of the quantities. In contrast, in the second case, the mean and PNZ of $\sqrt{\deg_i^- \deg_i^+}$
are close to those of $\deg_i^-$ and $\deg_i^+$, which means that the product keeps the large affinity well.
The correlation of $\deg_i^-$ and $\deg_i^+$, which is weak in (a) and strong in (b), further verifies the
effectiveness of our affinity measure for reducing noisy edges across ground-truth clusters and
keeping edges inside ground-truth clusters.

Remark 1 The computation is reduced to vector additions and inner products. So, our
algorithm is easy to implement.
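The equivalence between the vertex-wise definition and the matrix form can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, `W[i, j]` is assumed to be the weight of the edge from i to j, and a random dense matrix stands in for a real K-NN graph.

```python
import numpy as np

def affinity_sum(W, Ca, Cb):
    """Direct definition: sum over vertices of avg indegree * avg outdegree."""
    total = 0.0
    for i in Cb:   # asymmetric affinity from Cb to Ca
        total += (W[Ca, i].sum() / len(Ca)) * (W[i, Ca].sum() / len(Ca))
    for i in Ca:   # asymmetric affinity from Ca to Cb
        total += (W[Cb, i].sum() / len(Cb)) * (W[i, Cb].sum() / len(Cb))
    return total

def affinity_matrix_form(W, Ca, Cb):
    """Matrix form: all-one vectors times products of the two submatrices."""
    Wab = W[np.ix_(Ca, Cb)]   # rows in Ca, columns in Cb
    Wba = W[np.ix_(Cb, Ca)]   # rows in Cb, columns in Ca
    one_a, one_b = np.ones(len(Ca)), np.ones(len(Cb))
    return ((one_a @ Wab @ Wba @ one_a) / len(Ca) ** 2
            + (one_b @ Wba @ Wab @ one_b) / len(Cb) ** 2)

rng = np.random.default_rng(0)
W = rng.random((8, 8))
np.fill_diagonal(W, 0.0)
Ca, Cb = [0, 1, 2], [3, 4, 5, 6, 7]
direct = affinity_sum(W, Ca, Cb)
fast = affinity_matrix_form(W, Ca, Cb)
```

Both routes give the same value up to floating-point rounding; the matrix form only needs row sums, column sums, and one inner product per direction.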



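Proof. It is easy to see that, for $i \in C_b$,

$$
\deg_i^-(C_a) = \frac{1}{|C_a|} \left[ \mathbf{1}_{|C_a|}^T W_{C_a,C_b} \right]_i, \tag{6}
$$

$$
\deg_i^+(C_a) = \frac{1}{|C_a|} \left[ W_{C_b,C_a} \mathbf{1}_{|C_a|} \right]_i, \tag{7}
$$

where $[v]_i$ is the $i$-th element of vector $v$. Then, by Eq. (3), we can obtain the following
lemma.

Lemma 2

$$
A_{C_b \to C_a} = \frac{1}{|C_a|^2} \mathbf{1}_{|C_a|}^T W_{C_a,C_b} W_{C_b,C_a} \mathbf{1}_{|C_a|}. \tag{8}
$$

Finally, Theorem 1 can be directly implied by Lemma 2 using Eq. (4).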
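For readers who want the intermediate step spelled out, the lemma follows by writing the sum of elementwise products as an inner product. This is a sketch of the omitted algebra, using the convention that $W_{C_a,C_b}$ holds the weights of edges from $C_a$ to $C_b$:

$$
\begin{aligned}
A_{C_b \to C_a}
&= \sum_{i \in C_b} \deg_i^-(C_a)\,\deg_i^+(C_a) \\
&= \frac{1}{|C_a|^2} \sum_{i \in C_b}
   \left[ \mathbf{1}_{|C_a|}^T W_{C_a,C_b} \right]_i
   \left[ W_{C_b,C_a} \mathbf{1}_{|C_a|} \right]_i \\
&= \frac{1}{|C_a|^2}\,
   \mathbf{1}_{|C_a|}^T W_{C_a,C_b}\, W_{C_b,C_a} \mathbf{1}_{|C_a|} .
\end{aligned}
$$

Swapping the roles of $C_a$ and $C_b$ gives $A_{C_a \to C_b}$, and adding the two asymmetric affinities yields the symmetric affinity of Theorem 1.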

Comparison to average linkage. The GDL algorithm is different from average
linkage in the following three aspects. First of all, the conventional average linkage is
based on pairwise distances [1]. Although we find that average linkage has much better
performance on the K-NN graph than on pairwise distances, we are unaware of any
literature which studied the graph-based average linkage algorithm. Second, graph-based
average linkage simply symmetrizes the directed graph by setting
$w_{ij} = w_{ji} = (w_{ij} + w_{ji})/2$, while our algorithm uses the directed graph. Third, graph-based
average linkage can be interpreted as defining the affinity measure
$A_{C_b \to C_a} = \sum_{i \in C_b} [\deg_i^-(C_a) + \deg_i^+(C_a)]/2$ using our indegree-outdegree framework.
The sum of the indegree and outdegree is not as robust to noise as their product.
Experimental results in Fig. 2 and Sec. 4.1 demonstrate the superiority of GDL to
graph-based average linkage.
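The difference between the sum and the product can be seen on a toy graph in which two clusters are linked only by stray one-way edges. The sketch below is illustrative; the helper name and the edge weights are hypothetical, and `W[i, j]` is assumed to be the weight of the edge from i to j.

```python
import numpy as np

def degree_vectors(W, Ca, Cb):
    """Average indegree from Ca and outdegree to Ca, for every vertex in Cb."""
    indeg = W[np.ix_(Ca, Cb)].sum(axis=0) / len(Ca)
    outdeg = W[np.ix_(Cb, Ca)].sum(axis=1) / len(Ca)
    return indeg, outdeg

W = np.zeros((4, 4))
W[2, 0] = 0.8   # noisy edge from vertex 2 (in Cb) into Ca
W[1, 3] = 0.6   # noisy edge from Ca into vertex 3 (in Cb)
Ca, Cb = [0, 1], [2, 3]
indeg, outdeg = degree_vectors(W, Ca, Cb)
sum_affinity = float(np.sum((indeg + outdeg) / 2))   # average-linkage style
product_affinity = float(np.sum(indeg * outdeg))     # indegree-outdegree product
```

The sum-based affinity is 0.35 because each noisy edge contributes on its own, whereas the product-based affinity is exactly zero because no vertex has both an inedge and an outedge across the two clusters.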

3.4 Implementations of GDL 

We present two implementations of the GDL algorithm: an exact algorithm via an
efficient update formula and an approximate algorithm called Accelerated GDL (AGDL).
Both implementations have a time complexity of $O(n^2)$ (see Theorem 3).

Update formula. In each iteration, we select two clusters $C_a$ and $C_b$ with the largest
affinity and merge them as $C_{ab} = C_a \cup C_b$. Then, we need to update the asymmetric
affinities $A_{C_{ab} \to C_c}$ and $A_{C_c \to C_{ab}}$, for any other cluster $C_c$.

Using Lemma 2, we find that $A_{C_{ab} \to C_c}$ can be computed as follows.

$$
A_{C_{ab} \to C_c} = A_{C_a \to C_c} + A_{C_b \to C_c}. \tag{9}
$$

By storing all the asymmetric affinities, the update is simple.
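The additivity behind the update formula is easy to confirm numerically. The following sketch uses a hypothetical helper `asym_affinity` and a random weight matrix; `W[i, j]` is assumed to be the weight of the edge from i to j.

```python
import numpy as np

def asym_affinity(W, Csrc, Cdst):
    """Asymmetric affinity from Csrc to Cdst: sum over i in Csrc of
    (average indegree from Cdst) * (average outdegree to Cdst)."""
    indeg = W[np.ix_(Cdst, Csrc)].sum(axis=0)    # one entry per vertex of Csrc
    outdeg = W[np.ix_(Csrc, Cdst)].sum(axis=1)   # one entry per vertex of Csrc
    return float(indeg @ outdeg) / len(Cdst) ** 2

rng = np.random.default_rng(1)
W = rng.random((9, 9))
np.fill_diagonal(W, 0.0)
Ca, Cb, Cc = [0, 1], [2, 3, 4], [5, 6, 7, 8]

# Affinity from the merged cluster to Cc: computed directly and from stored values.
merged_to_c = asym_affinity(W, Ca + Cb, Cc)
stored_sum = asym_affinity(W, Ca, Cc) + asym_affinity(W, Cb, Cc)

# The reverse direction depends on the degrees and size of the merged cluster,
# so it is recomputed rather than summed.
c_to_merged = asym_affinity(W, Cc, Ca + Cb)
```

The first direction is additive because each vertex of the merged cluster keeps the same degrees with respect to the unchanged target cluster; the second is not, because the normalization and the degrees both change with the merged cluster.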

As the same update formula cannot be applied to $A_{C_c \to C_{ab}}$, we have to compute it
directly using Eq. (8). However, the total complexity is $O(n)$ in each iteration, due to
the row sparsity of $W$ (see Sec. 7 in the supplemental materials for details).

The GDL algorithm with the update formula (GDL-U) is presented as Algorithm 2
in the supplemental materials.

Accelerated GDL. Although the GDL-U algorithm is simple and fast, we further
propose AGDL. The major computational cost lies in computing the affinities. To reduce
the number of affinities computed in each iteration, AGDL maintains a neighbor set
of size $K^c$ for each cluster in $V^c$, to approximate its $K^c$-nearest cluster set.
Finding the maximum affinity among all pairs of clusters can then be approximated by
searching for it in all the neighbor sets. Updating the neighbor sets involves computing
the affinity between the new cluster and a small set of clusters, instead of all the other
clusters.

Denote the neighbor set of a cluster $C$ as $\mathcal{N}_C$. Initially $\mathcal{N}_C$ consists of $C$'s $K^c$-nearest
clusters. Once two clusters $C_a$ and $C_b$ are merged, we need to update the neighbor sets
which include $C_a$ or $C_b$, and create the neighbor set of $C_a \cup C_b$. We utilize two
assumptions: (1) if $C_a$ or $C_b$ is among the $K^c$-nearest clusters of $C_c$, $C_a \cup C_b$ is probably
among the $K^c$-nearest clusters of $C_c$; (2) if $C_c$ is among the $K^c$-nearest clusters of $C_a$ or
$C_b$, $C_c$ is probably among the $K^c$-nearest clusters of $C_a \cup C_b$. So, the new cluster $C_a \cup C_b$
is added to the neighbor sets which previously included $C_a$ or $C_b$. To create the neighbor
set for $C_a \cup C_b$, we select the $K^c$-nearest clusters from $\mathcal{N}_{C_a} \cup \mathcal{N}_{C_b}$.
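The neighbor-set bookkeeping described above can be sketched with plain dictionaries and sets. This is an illustrative reading of the two assumptions, not the algorithm from the supplemental materials: the function name, the integer cluster ids, and the choice to drop the ids of the two merged clusters from every neighbor set are assumptions made here.

```python
def update_neighbor_sets(neighbors, a, b, ab, affinity, Kc):
    """Update neighbor sets after merging clusters a and b into ab.

    neighbors: dict mapping cluster id -> set of neighbor cluster ids
    affinity:  callable (id, id) -> float; larger means closer
    """
    # Candidates for the new cluster: union of the two old neighbor sets.
    candidates = (neighbors.pop(a) | neighbors.pop(b)) - {a, b}
    # Clusters that listed a or b now list the merged cluster instead.
    for nbrs in neighbors.values():
        if a in nbrs or b in nbrs:
            nbrs.difference_update({a, b})
            nbrs.add(ab)
    # Keep the Kc candidates with the largest affinity to the merged cluster.
    ranked = sorted(candidates, key=lambda c: affinity(ab, c), reverse=True)
    neighbors[ab] = set(ranked[:Kc])
    return neighbors

# Toy example: clusters 0 and 1 are merged into a new cluster with id 4.
neighbors = {0: {1, 2}, 1: {0, 3}, 2: {0, 1}, 3: {2, 1}}
scores = {(4, 2): 0.5, (4, 3): 0.9}   # hypothetical affinities to the new cluster
updated = update_neighbor_sets(neighbors, 0, 1, 4, lambda x, y: scores[(x, y)], Kc=1)
```

Only the affinities between the merged cluster and the candidates in the union of the two old neighbor sets are evaluated, which is what keeps the per-iteration cost low.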

The AGDL algorithm is summarized in Algorithm 3 in the supplemental materials.

3.5 Time Complexity Analysis

We have the following theorem about the time complexity of the GDL, GDL-U and
AGDL algorithms (please refer to Sec. 7 in the supplemental materials for the proof).

Theorem 3

(a) The time complexity of the GDL algorithm (i.e., Algorithm 1) is $O(n^3)$.

(b) The time complexity of the GDL-U algorithm (i.e., Algorithm 2) is $O(n^2)$.

(c) The time complexity of the AGDL algorithm (i.e., Algorithm 3) is $O(n^2)$.

4 Experiments 

In this section, we demonstrate the effectiveness of GDL and AGDL on image
clustering and object matching. All the experiments are run in MATLAB on a PC with a
3.20 GHz CPU and 8 GB of memory.

4.1 Image Clustering 

We carry out experiments on six publicly available image benchmarks, including object
image databases (COIL-20 and COIL-100), hand-written digit databases (MNIST and
USPS), and facial image databases (Extended Yale-B, FRGC ver2.0). For MNIST, we
use all the images in the testing set. For FRGC ver2.0, we use all the facial images in
the training set of experiment 4. The statistics of all the datasets are presented in Table
2. We adopt widely used features for different kinds of images: the intensities of pixels
as features and Euclidean distance for object and digit images, and local binary patterns
(LBP) as features and $\chi^2$ distance for facial images.

We compare GDL-U and AGDL with eight representative algorithms, i.e.,
k-medoids (k-med) [1], average linkage (Link) [1], graph-based average linkage
(G-Link), normalized cuts (NCuts) 00, NJW spectral clustering (NJW-SC) 0, directed
graph spectral clustering (DGSC) [8], self-tuning spectral clustering (STSC) [11] and
Zell [18]. Here we use k-medoids instead of k-means because it can handle the case
where distances between points are not measured by Euclidean distances. To fairly
compare the graph-based algorithms, we fix $K = 20$ and select the $a$ with the best
performance from the set $\{10^i, i \in [-2 : 0.5 : 2]\}$ on all the datasets. For our algorithms, the
parameters are fixed as $K^0 = 1$, $K^c = 10$. The numbers of ground-truth clusters are
used as the input of all algorithms (e.g., $n_T$ in our algorithm).

We adopt the widely used Normalized Mutual Information (NMI) [12] to
quantitatively evaluate the performance of clustering algorithms. The NMI quantifies the
normalized statistical information shared between two distributions. A larger NMI value
indicates a better clustering result.
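For concreteness, one common way to compute NMI from two label lists is sketched below. The normalization used here, mutual information divided by the geometric mean of the two entropies, is an assumption: the text does not state which NMI variant is used, and other normalizations exist. The function name is hypothetical.

```python
import math
from collections import Counter

def nmi(labels_a, labels_b):
    """NMI = I(A; B) / sqrt(H(A) * H(B)), using natural logarithms."""
    n = len(labels_a)
    ca, cb = Counter(labels_a), Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    # Mutual information from the joint and marginal counts.
    mi = sum(c / n * math.log(c * n / (ca[x] * cb[y]))
             for (x, y), c in joint.items())
    ha = -sum(c / n * math.log(c / n) for c in ca.values())
    hb = -sum(c / n * math.log(c / n) for c in cb.values())
    if ha == 0.0 or hb == 0.0:
        return 0.0   # convention assumed here for a single-cluster labeling
    return mi / math.sqrt(ha * hb)

same_partition = nmi([0, 0, 1, 1], [1, 1, 0, 0])   # identical up to renaming
independent = nmi([0, 0, 1, 1], [0, 1, 0, 1])      # labels share no information
```

The measure is invariant to a renaming of cluster labels, which is why it is suitable for comparing a clustering against ground-truth classes.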

3 COIL-20 and COIL-100 are from http://www.cs.columbia.edu/CAVE/software/
MNIST and USPS are from http://www.cs.nyu.edu/~roweis/data.html
Extended Yale-B is from http://vision.ucsd.edu/~leekc/ExtYaleDatabase/ExtYaleB
FRGC ver2.0 is from http://face.nist.gov/frgc/

4 The code is downloaded from http://www.cis.upenn.edu/~jshi/software/
which implements the multiclass normalized cuts algorithm [20].

Table 1. Quantitative clustering results in NMI on real imagery data. A larger NMI value indicates
a better clustering result. The results shown in boldface are significantly better than the others,
with a significance level of 0.01.

Dataset    k-med   Link    G-Link  NCuts   NJW-SC  DGSC    STSC    Zell    GDL-U   AGDL
COIL-20    0.710   0.647   0.896   0.884   0.889   0.904   0.895   0.911   0.937   0.937
COIL-100   0.706   0.606   0.855   0.823   0.854   0.858   0.858   0.913   0.929   0.933
USPS       0.336   0.095   0.732   0.675   0.690   0.747   0.726   0.799   0.824   0.824
MNIST      0.390   0.304   0.808   0.753   0.755   0.795   0.756   0.768   0.844   0.844
Yale-B     0.329   0.255   0.766   0.809   0.851   0.869   0.860   0.781   0.910   0.910
FRGC       0.541   0.570   0.669   0.720   0.723   0.732   0.729   0.653   0.747   0.746

Fig. 4. Variations of performance of different clustering algorithms on the COIL-20 dataset: (a)
when the parameter $a$ for controlling $\sigma$ in Eq. (1) changes (horizontal axis: $\log_{10} a$); (b) when we
add Gaussian noise $\mathcal{N}(0, \sigma_n^2)$ to the images (horizontal axis: noise level $\sigma_n$). The NMI differences
between $\sigma_n = 0$ and $\sigma_n = 160$ are 0.048, 0.065, 0.067, 0.012, for G-Link, NJW-SC, DGSC, and
AGDL, respectively.



The results measured in NMI are given in Table 1. k-medoids and average linkage
perform similarly, as they rely heavily on the computation of pairwise distances and
thus are sensitive to noise, and cannot capture the complex cluster structures in the
real data sets well. NCuts, NJW-SC, and Zell have good performance on most data sets, as
they capture the underlying manifold structures of the data. STSC works fine on some
synthetic multiscale datasets in [11], but its results are worse than ours on several real
datasets in comparison. Note that STSC adaptively estimates the parameter $\sigma^2$ at every
point to reflect the variation of local density, while ours explores the indegree and outdegree
and fixes $\sigma^2$ as a constant. The effective and robust affinity measure for agglomerative
clustering makes our GDL-U and AGDL algorithms perform the best among all the
algorithms. The results of AGDL are nearly the same as those of GDL-U.

Compared to other graph-based algorithms, GDL-U and AGDL are more robust to
the parameter $a$ for building the graph, as well as to noise in the data (see Fig. 4). The
noise added to images can greatly degrade the performance of other algorithms, but our
performance is barely affected.

For the graph-based algorithms, we show their time cost in Table 2. AGDL costs
the least amount of time among all the algorithms. GDL-U is faster than NCuts, NJW-SC,
and DGSC, and is much faster than Zell. G-Link, which has worse performance than
AGDL, is comparable to AGDL in time cost.

Table 2. The time cost (in seconds) of the algorithms. The minimum time cost on each dataset is
in bold. The statistics of each dataset are shown for reference.

Dataset    Sample Num  Cluster Num  NCuts   NJW-SC  DGSC    Zell    GDL-U   AGDL
1 a /in    J.osU       o.zSU        t J. 11         U.z / /
COIL-100   7200        100          133.8   239.7   326.4   432.9   12.81   5.530
USPS       11000       10           263.0   461.6   538.9   9703    53.64   29.01
MNIST      10000       10           247.2   384.4   460.4   64003   35.60   17.18
Yale-B

(a) A pair of composite images, one of which is warped with $\sigma_n = 50$; (b) Initial correspondences
(533 inliers in yellow color, 1200 outliers in red color, according to the ground truth); (c) Detected
correspondences by AGDL (532 true, 552 detected, F-score 0.981).

Fig. 5. Example of object matching through feature correspondence clustering.

4.2 Feature Correspondence Clustering for Object Matching 

We show the effectiveness of our clustering algorithm in the presence of outliers via
feature correspondence clustering. Feature correspondence clustering is commonly used
for robust object matching [21,14,22], which can deal with geometric distortions of
objects across images and is a fundamental problem in computer vision. We demonstrate
that our algorithm can be effectively integrated with the framework of feature
correspondence clustering. Therefore, it has a range of potential applications, such as object
recognition, image retrieval, and 3D reconstruction.

We compare with two recent state-of-the-art methods, i.e., agglomerative
correspondence clustering (ACC) [14] and graph shift (GS) [22].

Overview of experiments. We follow the experiments in the ACC paper [14]. We
use composite images and their warped versions (Fig. 5(a)) to simulate cluttered scenes
where deformable objects appear. Then we can use the ground truth for performance
evaluation. Namely, we compute the precision and recall rates of detected
correspondences (Fig. 5(c)), given a set of correspondences with ground truth (Fig. 5(b)). A good
clustering algorithm can group inliers and separate outliers. This is a more direct way of
evaluating the performance of clustering algorithms than other experiments, such as
object recognition.

5 The code of ACC and GS is downloaded from http://cv.snu.ac.kr/research/~acc/
and http://sites.google.com/site/lhrbss/ respectively. We do not present the
results of spectral matching (SM) [21], because both ACC and GS outperformed SM greatly
[14,22], especially when there existed at least two clusters of correspondences according to
the ground truth.

Fig. 6. Performance comparison of different algorithms. In each sub-figure, one of the three
factors, i.e., the number of outliers, the level of deformation $\sigma_n$, and the number of common
sub-images $M$, is varied, while the other two are fixed as the values appearing at the top. All the
results are averaged over 30 random trials.



Experimental settings. We generate a pair of 3 x 3 tiled images that contain $M$
common sub-image(s). The common sub-images are randomly selected from the model
images of the ETHZ toys dataset, and the non-common sub-images are from test
images of the same dataset. The positions of all sub-images are randomly determined.
When $M > 1$, the common sub-images are chosen as different objects. To simulate
deformation, one of the paired images is warped using the thin-plate spline (TPS) model.
An example of paired test images is shown in Fig. 5(a). The 9 x 9 crossing points from
a 10 x 10 meshgrid on the image are chosen as the control points of the TPS model.
Then, all the control points are perturbed independently by Gaussian noise $\mathcal{N}(0, \sigma_n^2)$,
and the TPS warping is applied based on the perturbations of control points. To obtain
the candidate correspondences between two tiled images, features are extracted by the
MSER detector, and the best 3,000 correspondences are collected according to the
similarity of the SIFT descriptors. Using the warping model, each correspondence has a
ground-truth label: true if its error is smaller than three pixels, and false otherwise.
Fig. 5(b) shows the correspondences as lines, among which the yellow ones represent true
correspondences. Then, the performance of different algorithms is quantitatively
evaluated. We use the F-score, a traditional statistical measure of accuracy, which is defined
as 2 · precision · recall / (precision + recall).

6 http://www.vision.ee.ethz.ch/~calvin/datasets.html
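As a quick check of the evaluation measure, the counts reported in the Fig. 5 example (532 true correspondences among 552 detected, with 533 ground-truth inliers) can be plugged into the usual harmonic-mean form of the F-score. The sketch assumes that form, 2PR / (P + R), since it reproduces the 0.981 quoted in the figure; the function name is hypothetical.

```python
def f_score(num_true_detected, num_detected, num_inliers):
    """Harmonic mean of precision and recall."""
    precision = num_true_detected / num_detected   # fraction of detections that are true
    recall = num_true_detected / num_inliers       # fraction of inliers that are detected
    return 2 * precision * recall / (precision + recall)

score = f_score(532, 552, 533)
```

Rounded to three decimals, `score` is 0.981, which matches the F-score given for the example in Fig. 5.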

Parameters of ACC and GS. As we strictly follow the test protocol in the ACC
paper [14], we use the default parameters in their code. For GS, we compute the affinity
matrix $W_{ij} = \max(\beta - d_{ij}/\sigma_s^2, 0)$ as in [22], where $d_{ij}$ is the distance
between correspondence i and correspondence j as defined in the ACC paper [14]. $\beta$,
$\sigma_s$, and the other parameters in GS are tuned for the best performance.
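The thresholded affinity used for GS can be written out directly. The sketch below assumes the pairwise distances $d_{ij}$ are already available as a square list of lists; the function name and the example values are hypothetical and are not taken from the GS or ACC code.

```python
def gs_affinity(dist, beta, sigma_s):
    """Return W with W[i][j] = max(beta - dist[i][j] / sigma_s**2, 0).

    dist is an n x n list of lists of pairwise distances between
    correspondences; beta and sigma_s are the tuned GS parameters.
    """
    n = len(dist)
    return [
        [max(beta - dist[i][j] / sigma_s ** 2, 0.0) for j in range(n)]
        for i in range(n)
    ]


# Illustrative values only: pairs that are far apart get zero affinity.
example = gs_affinity([[0, 1, 8], [1, 0, 2], [8, 2, 0]], beta=2.0, sigma_s=2.0)
```

Clamping at zero makes the affinity matrix sparse: any pair with $d_{ij} \ge \beta \sigma_s^2$ contributes nothing.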

Parameters of AGDL. For our AGDL algorithm (i.e., Algorithm 3), the parameters
are fixed as $n_T = 50$, $a = 10$, $K = 35$, $K^0 = 2$, and $K^c = 10$. We found that GDL
works well over a large range of $n_T$, as the number of ground-truth clusters (i.e., M) is
very small and we can eliminate the outlier clusters by postprocessing.

Results. As shown in Fig. 6, we vary the number of outliers, the level of
deformation, and the number of common sub-images, and compare the F-scores of the
correspondences detected by different algorithms. Both ACC and GS perform excellently on
this task. They are very recent methods designed specifically for object matching, so
they are challenging to beat. Nevertheless, our simple clustering algorithm outperforms
them: AGDL performs consistently better than both ACC and GS under different settings,
with a higher F-score than both in 95.6% of the random trials over all the setting
combinations. We attribute the success of AGDL to its effective cluster affinity
measure, which is robust to noise and outliers.

5 Conclusion 

We present a fast and effective method for agglomerative clustering on a directed graph.
Our algorithm is based on indegree and outdegree, fundamental concepts in graph theory.
The indegree and outdegree have been widely studied in complex networks, but have not
received much attention in clustering. We analyze their roles in modeling the structures
of data, and show their power via the proposed graph degree linkage algorithm. We
demonstrate the superiority of this simple algorithm on image clustering and object
matching. We believe our work provides not only a simple and powerful clustering
algorithm for many applications in computer vision, but also an insightful analysis
of the graph representation of data via indegree and outdegree.
6 Implementations of GDL



Algorithm 2 Graph Degree Linkage with the update formula (GDL-U)

Input: a set of n samples $X = \{x_1, x_2, \ldots, x_n\}$, and the target number of clusters $n_T$.
Build the $K^0$-NN graph, and detect its weakly connected components as initial clusters. Denote
the set of initial clusters as $V^c = \{C_1, \ldots, C_{n_c}\}$, where $n_c$ is the number of clusters.
Build the K-NN graph, and get the weighted adjacency matrix W.
Initialize the asymmetric affinity table $A_{C_a \to C_b}$ for $C_a, C_b \in V^c$.
while $n_c > n_T$ do
    Search two clusters $C_a$ and $C_b$, such that $\{C_a, C_b\} = \arg\max_{C_a, C_b \in V^c} A_{C_a, C_b}$;
    $V^c \leftarrow \{V^c \setminus \{C_a, C_b\}\} \cup \{C_{ab}\}$, where $C_{ab} = C_a \cup C_b$, and $n_c = n_c - 1$;
    For all $C_c$, compute $A_{C_{ab} \to C_c}$ using the update formula, i.e., Eq. (9), and
    $A_{C_c \to C_{ab}}$ using its counterpart update equation.
end while
Output: $V^c$.
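The update step in the loop above can be illustrated with a small sketch. It rests on an assumption that is not spelled out in this section: that the directed affinity $A_{C_a \to C_b}$ is the sum, over vertices in $C_a$, of the product of each vertex's average indegree from $C_b$ and average outdegree to $C_b$, in line with the product-based affinity described in the abstract. Under that assumed definition the affinity from the merged cluster is additive, which is what makes a constant-time update possible. All function names here are hypothetical.

```python
def directed_affinity(W, src, dst):
    """Assumed A_{src -> dst}: sum over i in src of (average indegree of i
    from dst) * (average outdegree of i to dst). W[i][j] is the weight of
    the directed edge i -> j; src and dst are lists of vertex indices."""
    total = 0.0
    for i in src:
        indeg = sum(W[j][i] for j in dst) / len(dst)   # edges dst -> i
        outdeg = sum(W[i][j] for j in dst) / len(dst)  # edges i -> dst
        total += indeg * outdeg
    return total


def merged_to_other(a_to_c, b_to_c):
    """Update for A_{C_ab -> C_c}: reuse the two stored table entries."""
    return a_to_c + b_to_c


def other_to_merged(W, c, a, b):
    """A_{C_c -> C_ab} is not additive (the normalization changes with
    |C_ab|), so this sketch simply recomputes it from W."""
    return directed_affinity(W, c, a + b)
```

The additive case needs no access to W after initialization, whereas the reverse direction still touches the rows and columns of the merged cluster.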



Algorithm 3 Accelerated Graph Degree Linkage (AGDL)

Input: a set of n sample vectors $X = \{x_1, x_2, \ldots, x_n\}$, and the target number of clusters $n_T$.
Build the $K^0$-NN graph, and detect its weakly connected components as initial clusters. Denote
the set of initial clusters as $V^c = \{C_1, \ldots, C_{n_c}\}$, where $n_c$ is the number of clusters.
Build the K-NN graph, and get the weighted adjacency matrix W.
Create a neighbor set for each cluster in $V^c$, and initialize it as the $K^c$-nearest cluster set.
while $n_c > n_T$ do
    Search two clusters $C_a$ and $C_b$ from the affinity of pairs of clusters associated with the
    neighbor sets, such that $\{C_a, C_b\} = \arg\max_{C_a \in N_{C_b} \text{ or } C_b \in N_{C_a}} A_{C_a, C_b}$;
    $V^c \leftarrow \{V^c \setminus \{C_a, C_b\}\} \cup \{C_{ab}\}$, where $C_{ab} = C_a \cup C_b$, and $n_c = n_c - 1$;
    For all $C_c$ such that $C_a \in N_{C_c}$ or $C_b \in N_{C_c}$, add $C_{ab}$ to $N_{C_c}$, and compute the
    affinity $A_{C_{ab}, C_c}$;
    Find the $K^c$-nearest clusters for $C_{ab}$ in the set $N_{C_a} \cup N_{C_b}$, to form $N_{C_{ab}}$;
    Remove $C_a$ and $C_b$ from the neighbor sets, and remove $N_{C_a}$ and $N_{C_b}$.
end while
Output: $V^c$.
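The neighbor-set bookkeeping in Algorithm 3 can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes a symmetric affinity stored as a nested dictionary `aff[a][b]`, covers only the initialization and the restricted search for the best pair, and uses hypothetical function names.

```python
def init_neighbor_sets(aff, k_c):
    """For each cluster id a, keep the k_c clusters with the largest
    affinity to a. aff[a][b] is the (assumed symmetric) affinity."""
    neighbors = {}
    for a, row in aff.items():
        ranked = sorted((b for b in row if b != a),
                        key=lambda b: row[b], reverse=True)
        neighbors[a] = set(ranked[:k_c])
    return neighbors


def best_pair(aff, neighbors):
    """Search only pairs (a, b) with b in N_a. Iterating over every a
    also covers the case a in N_b, i.e. the 'or' in the argmax."""
    best, best_val = None, float("-inf")
    for a, nbrs in neighbors.items():
        for b in nbrs:
            if aff[a][b] > best_val:
                best, best_val = (a, b), aff[a][b]
    return best, best_val
```

The search inspects at most $K^c n_c$ entries instead of the full $n_c \times n_c$ table, which is where the saving over Algorithm 2 comes from.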



7 Proof of Theorem 1

Proof. For (a), we analyze the time complexity for each part of the GDL algorithm.

(1) The directed graph construction has a complexity of at most $O(Kn^2)$ (naive
implementation). Note that $n \gg K$, and thus we omit K in the complexities hereinafter.

(2) The complexity of constructing initial clusters is $O(n)$, as the number of edges in
the graph G is $O(K^0 n)$, where $K^0$ is 1 or 2.

(3) In our clustering algorithm, we use an $n_c \times n_c$ table to store the affinities between
clusters. As the initial clusters are of small sizes, we can assume $O(1)$ complexity
for computing the affinity between each pair of clusters. So, it requires a
complexity of $O(n_0^2)$ to initialize the table, where $n_0$ is the number of initial clusters
($n_0 < n$).

(4) In each iteration, it costs $O(n_c^2)$ to find the maximum value in the cluster affinity
table. To update the cluster affinity table after merging the two clusters with the
maximum affinity value, we need to compute $(n_c - 1)$ affinities, and each affinity is
computed with a complexity of $O(|C_a| + |C_b|)$ using Eq. (5) (because W is a sparse
matrix with K nonzero elements in each row). Therefore, the complexity for each
iteration is at most $O(n_c n)$.

(5) The number of iterations is $(n_0 - n_T)$.

By replacing $n_0$ and $n_c$ with their upper bound n, a loose upper bound of the time
complexity of the GDL algorithm is $O(n^3)$.
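The bound in (a) can be made explicit by adding the one-off costs in (1) to (3) to the per-iteration cost in (4), summed over the iterations counted in (5). The following is a sketch of that arithmetic under the same simplifications as the proof (K omitted, $n_c \le n_0 < n$):

```latex
\[
\underbrace{O(n^2)}_{(1)} + \underbrace{O(n)}_{(2)} + \underbrace{O(n_0^2)}_{(3)}
+ \sum_{n_c = n_T + 1}^{n_0} \underbrace{O(n_c\, n)}_{(4)}
\;\le\; O(n^2) + O\!\left(n \cdot \frac{n_0 (n_0 + 1)}{2}\right)
\;=\; O(n\, n_0^2)
\;\le\; O(n^3).
\]
```

The sum over iterations dominates, so the cubic bound comes entirely from recomputing $(n_c - 1)$ affinities after every merge.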

For (b), we can reduce the complexity in each iteration from $O(n_c n)$ in (a) to $O(n)$.
We can maintain a table to store the nearest cluster of each cluster. In each iteration,
finding the maximum value and updating the table cost approximately $O(n_c)$. For the
affinity table, the updating scheme of $A_{C_{ab} \to C_c}$ as in Eq. (9) costs $O(n_c)$ for all the new
affinities. To compute $A_{C_c \to C_{ab}}$, the total complexity for all the new affinities is less
than the complexity of computing $W_{C_{ab},*} W_{*,C_{ab}}$, which is $O(nK)$, as $W_{C_{ab},*}$ is
K-sparse in each row. $W_{C_{ab},*}$ is the submatrix of W whose row indices correspond to the
vertices in $C_{ab}$ and whose column indices are from 1 to n.

Finally, the total complexity for GDL-U is $O(n^2)$.

For (c), there are several differences in the AGDL:

- In (3), we use the neighbor sets of clusters instead of the cluster affinity table. The
construction of all the neighbor sets costs $O(K^c n_0^2)$.

- In (4), we need to find the maximum affinity value in the neighbor sets (with
complexity of $O(K^c n_c)$) and compute $O(K^c (1 + \tau))$ affinities to update the neighbor
sets (with complexity of $O(K^c n)$). This is because the size of the union of the neighbor
sets of $C_a$ and $C_b$ is less than $2K^c$, and for real data, we can assume that the number of
clusters whose neighbor set includes $C_a$ or $C_b$ is less than $2\tau K^c$, where $\tau$ is usually
a small constant close to 1. Therefore, the complexity for each iteration is at most
$O(n)$.

So, the time complexity of the AGDL algorithm is $O(n^2)$.



A heap could be used to make the nearest-cluster table more efficient. However, this part
is not the bottleneck in either the complexity analysis or the run-time of GDL-U.
Table 3. Quantitative clustering results in CE on real imagery data. A smaller CE value indicates
a better clustering result. The results shown in boldface are significantly better than the others,
with a significance level of 0.01.

Dataset    k-med   Link    G-Link  NCuts   NJW-SC  DGSC    STSC    Zell    GDL-U   AGDL
COIL-20    0.401   0.677   0.213   0.246   0.228   0.201   0.158   0.187   0.142   0.142
COIL-100   0.570   0.819   0.394   0.462   0.411   0.396   0.391   0.351   0.267   0.269
USPS       0.607   0.874   0.252   0.459   0.354   0.255   0.421   0.332   0.246   0.246
MNIST      0.577   0.776   0.162   0.405   0.432   0.230   0.305   0.400   0.150   0.150
Yale-B     0.728   0.847   0.376   0.273   0.270   0.237   0.205   0.464   0.197   0.197
FRGC       0.728   0.753   0.664   0.565   0.596   0.595   0.580   0.560   0.548   0.551



[Plot: connectivity score of each cluster against its rank; x-axis: Rank, 1 to 10.]

Fig. 7. The connectivity scores of clusters sorted in descending order. The threshold for separating
inliers and outliers is shown as a red dashed line.

8 Quantitative Results in Clustering Error for Image Clustering 

The quantitative results, measured in CE [12], are given in Table 3. The CE is defined
as the minimum overall error rate among all possible permutation mappings between
true class labels and clusters. A smaller CE value indicates a better clustering result.
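The definition of CE translates directly into a brute-force search over label permutations. The sketch below is an illustration only: it assumes that the number of clusters equals the number of true classes, and it enumerates all $k!$ mappings, which is practical only for a small number of classes (an assignment solver such as the Hungarian algorithm would be the usual substitute for larger k). The function name is hypothetical.

```python
from itertools import permutations


def clustering_error(true_labels, cluster_ids):
    """Minimum error rate over all one-to-one mappings from clusters to
    true class labels. Assumes as many clusters as classes."""
    if len(true_labels) != len(cluster_ids) or not true_labels:
        raise ValueError("label lists must be non-empty and of equal length")
    classes = sorted(set(true_labels))
    clusters = sorted(set(cluster_ids))
    if len(classes) != len(clusters):
        raise ValueError("this sketch assumes as many clusters as classes")
    n = len(true_labels)
    best_errors = n
    for perm in permutations(classes):
        mapping = dict(zip(clusters, perm))  # cluster id -> class label
        errors = sum(1 for t, c in zip(true_labels, cluster_ids)
                     if mapping[c] != t)
        best_errors = min(best_errors, errors)
    return best_errors / n
```

Because the minimum is taken over mappings, the result does not depend on how the clusters happen to be named.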

9 Outlier Elimination for Object Matching 

For AGDL, we observe that there are many inedges and outedges inside a cluster of
inliers, but fewer edges inside a cluster of outliers, because outliers lie in low-density
regions. Inspired by this, we define the connectivity score of a cluster C as
$\sum_{i \in C} [\deg_i^-(C) + \deg_i^+(C)]$. We find that there are always large differences between
the scores of inlier clusters and outlier clusters (see Fig. 7). Therefore, we rank the final
clusters by their connectivity scores. Namely, we sort their scores in descending order,
and then search for the largest gap between two consecutive scores. This gap divides the
set of clusters into two disjoint subsets. The subset of clusters with small scores is
treated as the collection of outliers and removed. For ACC and GS, we use their default
methods for outlier elimination.
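The ranking-and-gap step described above can be sketched as follows. The connectivity scores are taken as given (a dictionary from a cluster identifier to its score); the function name and the example scores are hypothetical and serve only to illustrate the split.

```python
def split_by_largest_gap(scores):
    """Sort clusters by connectivity score in descending order and cut at
    the largest gap between consecutive scores.

    scores: dict mapping cluster id -> connectivity score.
    Returns (inlier_ids, outlier_ids), each ordered by descending score.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    if len(ranked) < 2:
        return ranked, []  # nothing to separate
    gaps = [scores[ranked[i]] - scores[ranked[i + 1]]
            for i in range(len(ranked) - 1)]
    cut = max(range(len(gaps)), key=gaps.__getitem__)  # index of largest gap
    return ranked[:cut + 1], ranked[cut + 1:]


# Illustrative scores only: two dense clusters and three sparse ones.
inliers, outliers = split_by_largest_gap(
    {"A": 9.5, "B": 8.7, "C": 1.2, "D": 0.9, "E": 0.4}
)
```

Using the largest gap rather than a fixed threshold means the cut adapts to the scale of the scores in each trial.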



